cancer cell line
Scaffold Splits Overestimate Virtual Screening Performance
Guo, Qianrong, Hernandez-Hernandez, Saiveth, Ballester, Pedro J
Virtual Screening (VS) of vast compound libraries guided by Artificial Intelligence (AI) models is a highly productive approach to early drug discovery. Data splitting is crucial for better benchmarking of such AI models. Traditional random data splits produce similar molecules between training and test sets, conflicting with the reality of VS libraries which mostly contain structurally distinct compounds. Scaffold split, grouping molecules by shared core structure, is widely considered to reflect this real-world scenario. However, here we show that the scaffold split also overestimates VS performance. The reason is that molecules with different chemical scaffolds are often similar, which hence introduces unrealistically high similarities between training molecules and test molecules following a scaffold split. Our study examined three representative AI models on 60 NCI-60 datasets, each with approximately 30,000 to 50,000 molecules tested on a different cancer cell line. Each dataset was split with three methods: scaffold, Butina clustering and the more accurate Uniform Manifold Approximation and Projection (UMAP) clustering. Regardless of the model, model performance is much worse with UMAP splits from the results of the 2100 models trained and evaluated for each algorithm and split. These robust results demonstrate the need for more realistic data splits to tune, compare, and select models for VS. For the same reason, avoiding the scaffold split is also recommended for other molecular property prediction problems. The code to reproduce these results is available at https://github.com/ScaffoldSplitsOverestimateVS
VQSynery: Robust Drug Synergy Prediction With Vector Quantization Mechanism
Wu, Jiawei, Yan, Mingyuan, Liu, Dianbo
The pursuit of optimizing cancer therapies is significantly advanced by the accurate prediction of drug synergy. Traditional methods, such as clinical trials, are reliable yet encumbered by extensive time and financial demands. The emergence of high-throughput screening and computational innovations has heralded a shift towards more efficient methodologies for exploring drug interactions. In this study, we present VQSynergy, a novel framework that employs the Vector Quantization (VQ) mechanism, integrated with gated residuals and a tailored attention mechanism, to enhance the precision and generalizability of drug synergy predictions. Our findings demonstrate that VQSynergy surpasses existing models in terms of robustness, particularly under Gaussian noise conditions, highlighting its superior performance and utility in the complex and often noisy domain of drug synergy research. This study underscores the potential of VQSynergy in revolutionizing the field through its advanced predictive capabilities, thereby contributing to the optimization of cancer treatment strategies.
TransCDR: a deep learning model for enhancing the generalizability of cancer drug response prediction through transfer learning and multimodal data fusion for drug representation
Xia, Xiaoqiong, Zhu, Chaoyu, Shan, Yuqi, Zhong, Fan, Liu, Lei
Accurate and robust drug response prediction is of utmost importance in precision medicine. Although many models have been developed to utilize the representations of drugs and cancer cell lines for predicting cancer drug responses (CDR), their performances can be improved by addressing issues such as insufficient data modality, suboptimal fusion algorithms, and poor generalizability for novel drugs or cell lines. We introduce TransCDR, which uses transfer learning to learn drug representations and fuses multi-modality features of drugs and cell lines by a self-attention mechanism, to predict the IC50 values or sensitive states of drugs on cell lines. We are the first to systematically evaluate the generalization of the CDR prediction model to novel (i.e., never-before-seen) compound scaffolds and cell line clusters. TransCDR shows better generalizability than 8 state-of-the-art models. TransCDR outperforms its 5 variants that train drug encoders (i.e., RNN and AttentiveFP) from scratch under various scenarios. The most critical contributors among multiple drug notations and omics profiles are Extended Connectivity Fingerprint and genetic mutation. Additionally, the attention-based fusion module further enhances the predictive performance of TransCDR. TransCDR, trained on the GDSC dataset, demonstrates strong predictive performance on the external testing set CCLE. It is also utilized to predict missing CDRs on GDSC. Moreover, we investigate the biological mechanisms underlying drug response by classifying 7,675 patients from TCGA into drug-sensitive or drug-resistant groups, followed by a Gene Set Enrichment Analysis. TransCDR emerges as a potent tool with significant potential in drug response prediction. The source code and data can be accessed at https://github.com/XiaoqiongXia/TransCDR.
Data-Driven Information Extraction and Enrichment of Molecular Profiling Data for Cancer Cell Lines
Smith, Ellery, Paloots, Rahel, Giagkos, Dimitris, Baudis, Michael, Stockinger, Kurt
With the proliferation of research means and computational methodologies, published biomedical literature is growing exponentially in numbers and volume. As a consequence, in the fields of biological, medical and clinical research, domain experts have to sift through massive amounts of scientific text to find relevant information. However, this process is extremely tedious and slow to be performed by humans. Hence, novel computational information extraction and correlation mechanisms are required to boost meaningful knowledge extraction. In this work, we present the design, implementation and application of a novel data extraction and exploration system. This system extracts deep semantic relations between textual entities from scientific literature to enrich existing structured clinical data in the domain of cancer cell lines. We introduce a new public data exploration portal, which enables automatic linking of genomic copy number variants plots with ranked, related entities such as affected genes. Each relation is accompanied by literature-derived evidences, allowing for deep, yet rapid, literature search, using existing structured data as a springboard. Our system is publicly available on the web at https://cancercelllines.org
Deep learning for drug response prediction in cancer
Predicting the sensitivity of tumors to specific anti-cancer treatments is a challenge of paramount importance for precision medicine. Machine learning(ML) algorithms can be trained on high-throughput screening data to develop models that are able to predict the response of cancer cell lines and patients to novel drugs or drug combinations. Deep learning (DL) refers to a distinct class of ML algorithms that have achieved top-level performance in a variety of fields, including drug discovery. These types of models have unique characteristics that may make them more suitable for the complex task of modeling drug response based on both biological and chemical data, but the application of DL to drug response prediction has been unexplored until very recently. The few studies that have been published have shown promising results, and the use of DL for drug response prediction is beginning to attract greater interest from researchers in the field. In this article, we critically review ...
Can artificial intelligence help identify best treatments for cancers? LSU researchers say yes
A team of LSU researchers has developed a way to determine which drug therapies work best against an individual's unique type of cancer, possibly providing a way to find cures more quickly and make treatment more affordable. The interdisciplinary team includes researchers from the School of Veterinary Medicine, College of Science, College of Engineering and the Center for Computation & Technology. It created CancerOmicsNet, a new drug discovery engine run by artificial intelligence. Using algorithms originally designed to map complex social networks, like those utilized by Facebook, researchers generated three-dimensional graphs of molecular datasets that include cancer cell lines, drug compounds and interactions among proteins inside the human body. The graphs are then analyzed and interconnected by AI, forming a much clearer picture of how a specific cancer would respond to a specific drug.
TCR: A Transformer Based Deep Network for Predicting Cancer Drugs Response
Gao, Jie, Hu, Jing, Sun, Wanqing, Shen, Yili, Zhang, Xiaonan, Fang, Xiaomin, Wang, Fan, Zhao, Guodong
Predicting clinical outcomes to anti-cancer drugs on a personalized basis is challenging in cancer treatment due to the heterogeneity of tumors. Traditional computational efforts have been made to model the effect of drug response on individual samples depicted by their molecular profile, yet overfitting occurs because of the high dimension for omics data, hindering models from clinical application. Recent research shows that deep learning is a promising approach to build drug response models by learning alignment patterns between drugs and samples. However, existing studies employed the simple feature fusion strategy and only considered the drug features as a whole representation while ignoring the substructure information that may play a vital role when aligning drugs and genes. Hereby in this paper, we propose TCR (Transformer based network for Cancer drug Response) to predict anti-cancer drug response. By utilizing an attention mechanism, TCR is able to learn the interactions between drug atom/sub-structure and molecular signatures efficiently in our study. Furthermore, a dual loss function and cross sampling strategy were designed to improve the prediction power of TCR. We show that TCR outperformed all other methods under various data splitting strategies on all evaluation matrices (some with significant improvement). Extensive experiments demonstrate that TCR shows significantly improved generalization ability on independent in-vitro experiments and in-vivo real patient data. Our study highlights the prediction power of TCR and its potential value for cancer drug repurpose and precision oncology treatment.
New algorithm will prevent misidentification of cancer cells
Researchers from Kent have developed a computer algorithm that can identify differences in cancer cell lines based on microscopic images, a unique development towards ending misidentification of cells in laboratories. Cancer cell lines are cells isolated and grown as cell cultures in laboratories for study and developing anti-cancer drugs. However, many cell lines are misidentified after being swapped or contaminated with others, meaning many researchers may work with incorrect cells. This has been a persistent problem since work with cancer cell lines began. Short tandem repeat (STR) analysis is commonly used to identify cancer cell lines, but is expensive and time-consuming.
Image-based phenotyping of disaggregated cells using deep learning
The ability to phenotype cells is fundamentally important in biological research and medicine. Current methods rely primarily on fluorescence labeling of specific markers. However, there are many situations where this approach is unavailable or undesirable. Machine learning has been used for image cytometry but has been limited by cell agglomeration and it is currently unclear if this approach can reliably phenotype cells that are difficult to distinguish by the human eye. Here, we show disaggregated single cells can be phenotyped with a high degree of accuracy using low-resolution bright-field and non-specific fluorescence images of the nucleus, cytoplasm, and cytoskeleton. Specifically, we trained a convolutional neural network using automatically segmented images of cells from eight standard cancer cell-lines. These cells could be identified with an average F1-score of 95.3%, tested using separately acquired images. Our results demonstrate the potential to develop an “electronic eye” to phenotype cells directly from microscopy images. Berryman et al demonstrate that disaggregated cells can be phenotyped with a high degree of accuracy from bright-field and non-specifically stained microscopy images using a trained convolutional neural network (CNN). This approach allows for the identification of cell types without the need for specific markers.
Variational Autoencoder for Anti-Cancer Drug Response Prediction
Dong, Hongyuan, Xie, Jiaqing, Jing, Zhi, Ren, Dexin
Cancer has long been a main cause of human death, and the discovery of new drugs and the customization of cancer therapy have puzzled people for a long time. In order to facilitate the discovery of new anti-cancer drugs and the customization of treatment strategy, we seek to predict the response of different anti-cancer drugs with variational autoencoders (VAE) and multi-layer perceptron (MLP).Our model takes as input gene expression data of cancer cell lines and anti-cancer drug molecular data, and encode these data with {\sc {GeneVae}} model, which is an ordinary VAE, and rectified junction tree variational autoencoder ({\sc JtVae}) (\cite{jin2018junction}) model, respectively. Encoded features are processes by a Multi-layer Perceptron (MLP) model to produce a final prediction. We reach an average coefficient of determination ($R^{2} = 0.83$) in predicting drug response on breast cancer cell lines and an average $R^{2} > 0.84$ on pan-cancer cell lines. Additionally, we show that our model can generate unseen effective drug compounds for specific cancer cell lines.